Regular Expressions: The Ultimate in Lack of Redundancy

Mark Leighton Fisher on 2009-09-23T18:12:58

The concepts in regular expressions are simple -- one of anything, a numeric character, a class of characters -- so why do so many people have problems with regular expressions? I think it is the lack of redundancy.

Each concept in regular expressions is expressed in 1 or 2 characters -- "." is one of anything, "*" is zero or more of the preceding thing, and so on. Compare this with C, where matching against a 'b' in a string could be coded compactly as:

match = 0;
for (; *c; c++) {           /* test every character, including the first */
    if (*c == 'b') {
        match = 1;
        break;
    }
}

Although a modern language would cut down the size of that code, it still wouldn't come close to the one character of the corresponding regular expression. And therein lies the problem -- we humans rely on redundancy when interpreting information. In theory, you would never need more than one character to represent any concept in a computer language. Yet the programming languages that gain general popularity are languages with some amount of redundancy built in -- Java, Perl, Python, C# -- the list goes on and on. Given that we've had minimally-redundant programming languages almost since programming languages were first conceived of in the 1950s (APL, anyone?), if minimally-redundant programming languages were going to take over the world, it would have happened by now -- and it hasn't happened.
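For comparison, here is a minimal sketch of the regular-expression side of the C example above, using the POSIX regex.h interface (regcomp, regexec, regfree). It assumes a POSIX system, and the function name contains_b is made up for illustration.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if text contains a 'b', 0 if it does not,
 * and -1 if the pattern could not be compiled.
 * contains_b is a hypothetical name used only for this sketch. */
int contains_b(const char *text)
{
    regex_t re;
    int rc;

    /* The whole search loop collapses into the one-character pattern "b".
     * REG_NOSUB: we only want a yes/no answer, not match offsets. */
    if (regcomp(&re, "b", REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);  /* 0 means a match was found */
    regfree(&re);

    return rc == 0;
}
```

Everything except the string "b" is scaffolding around the library; the part that says what to look for is still a single character, with nothing redundant to help the reader.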

Another example of our need for redundancy is in driving directions. The best driving directions always contain some redundancy -- "you turn at Capitol, which is between Senate and Illinois" -- instead of "you turn at Capitol", as "you turn at Capitol" gives you no idea of where Capitol actually is -- it could be several miles down the road, or 1 block after the previous turn. (I have wondered if giving good directions is a skill similar to that of programming.)

Reading may be another example -- you can usually get the gist of a paragraph of English text even when only the first and last letters of each word are in their right places (thereby demonstrating that the other letters are mostly redundant).

Music is possibly another example of the human need for redundancy. Whether it is the de-de-de-dah motif of Beethoven's 5th symphony, the distinctive drum line of Led Zeppelin's "Immigrant Song", or the chorus of Green Day's "21 Guns", music relies on redundancy through repetition. In theory, you should only need to hear each part of a song once to derive full musical enjoyment from the song. But instead, in Beethoven's 5th symphony (where there are no words to require musical backing) Beethoven repeats the motif over and over again. And Beethoven's 5th symphony is widely regarded as one of the crowning achievements of music -- yet it is filled with redundancy through repetition, although there is no theoretical reason for that level of repetition. Or is there?

Truth is, we humans need a certain level of redundancy in our information before a concept is firmly planted in our heads, whether it is a popular song or the clauses of our national constitution. The reason that I and so many others have found such success with the Head First book series is that Head First's use of redundancy (presenting each piece of information in several different ways) helps to ensure that you retain the information in the books.

Perl's "/x" modifier may turn out to be one of the most significant advances in regular expression syntax, because /x lets you split up and comment your regular expressions -- operations that increase both the readability and the redundancy of your regular expressions. (The redundancy comes from each part of the regular expression now being represented, in the common case, by a whitespace-bounded line of text instead of just the regular expression characters.)
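The POSIX regex interface for C has no direct counterpart to /x, but a similar effect can be sketched with adjacent string literals, which the C compiler joins into one pattern, leaving room for a comment beside each piece. The pattern here (a 24-hour HH:MM time) and the names TIME_PATTERN and is_time are hypothetical, chosen only for illustration, and the code assumes a POSIX system with regex.h.

```c
#include <regex.h>
#include <stddef.h>

/* Adjacent string literals are concatenated at compile time, so this is
 * the single pattern ^([01][0-9]|2[0-3]):[0-5][0-9]$ spread over lines. */
static const char TIME_PATTERN[] =
    "^"                     /* start of string        */
    "([01][0-9]|2[0-3])"    /* hours, 00 through 23   */
    ":"                     /* separator              */
    "[0-5][0-9]"            /* minutes, 00 through 59 */
    "$";                    /* end of string          */

/* Returns 1 if text is an HH:MM time, 0 if it is not,
 * and -1 if the pattern could not be compiled. */
int is_time(const char *text)
{
    regex_t re;
    int rc;

    /* REG_EXTENDED enables the ( | ) syntax used above. */
    if (regcomp(&re, TIME_PATTERN, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);  /* 0 means a match was found */
    regfree(&re);

    return rc == 0;
}
```

Each line says the same thing twice, once in regular expression characters and once in words, which is the kind of redundancy that /x makes room for in Perl.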

(Why we humans need all this redundancy is better left to another day, although I will give you a hint: why do humans still have appendixes?)


repetition, patterns

slanning on 2009-09-24T21:02:08

I think repetition and redundancy are different, though something that repeats can be redundant.

We humans are relatively good at pattern matching. We recognize patterns in music, even though what we actually perceive can be quite different. We can recognize Beethoven's 5th whether it's played on violins, a MIDI synthesizer, or even somebody humming or whistling it, though the sound waves impinging on our eardrums are quite different. We recognize it in C minor or G major. We recognize it slower or faster or when the speed is varied. We can recognize a trance remix of it. You might even recognize it visually, or maybe by someone tapping it on you or by sucking on an electric lollipop. Well, I forgot my point, so I'll have to re-read your post. ^.^